19 research outputs found

    Interactive Teaching Algorithms for Inverse Reinforcement Learning

    Full text link
    We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework in which the teacher adaptively chooses the next demonstration based on the learner's current policy. In particular, we design teaching algorithms for two concrete settings: an omniscient setting, where the teacher has full knowledge of the learner's dynamics, and a blackbox setting, where the teacher has minimal knowledge. We then study a sequential variant of the popular MCE-IRL learner and prove convergence guarantees for our teaching algorithm in the omniscient setting. Extensive experiments in a car-driving simulator environment show that learning progress can be sped up drastically compared with an uninformative teacher.
    Comment: IJCAI'19 paper (extended version)
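    To make the interactive loop concrete, here is a minimal Python sketch of an omniscient teacher greedily selecting demonstrations for a simplified learner. The learner is reduced to a running estimate of expert feature expectations (in the spirit of feature matching); the averaging update, the candidate pool, and all variable names are illustrative assumptions for exposition, not the paper's actual algorithm or its MCE-IRL learner.

        import numpy as np

        # Hypothetical sketch: omniscient teacher simulates the learner's update
        # for each candidate demonstration and picks the most informative one.
        rng = np.random.default_rng(0)
        n_features = 6
        true_mu = rng.normal(size=n_features)           # expert's feature expectations
        candidates = rng.normal(size=(30, n_features))  # feature counts of candidate demos

        learner_mu = np.zeros(n_features)               # learner's current estimate

        for t in range(1, 8):
            # Teacher scores each candidate by the residual error it would leave
            # after the learner's (assumed) running-mean update.
            def residual(demo):
                return np.linalg.norm((learner_mu * (t - 1) + demo) / t - true_mu)

            best = min(candidates, key=residual)
            # Learner incorporates the chosen demonstration (running-mean update).
            learner_mu = (learner_mu * (t - 1) + best) / t
            print(f"round {t}: feature-matching error = {np.linalg.norm(learner_mu - true_mu):.3f}")

    The simulate-then-select step mirrors the omniscient setting described in the abstract, where the teacher knows the learner's dynamics; a blackbox teacher would instead have to infer the learner's state from its observed behaviour.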

    Transitions, Losses, and Re-parameterizations: Elements of Prediction Games

    No full text
    This thesis presents some geometric insights into three different types of two-player prediction games: the general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' moves, and the utility function. The insights shed light on the intrinsic barriers of these prediction problems and on the design of computationally efficient learning algorithms with strong theoretical guarantees (such as generalizability, statistical consistency, and constant regret). The main contributions of the thesis are:
    • Leveraging concepts from statistical decision theory, we develop a toolkit for formalizing the prediction games mentioned above and quantifying their objectives.
    • We investigate the cost-sensitive classification problem, an instantiation of the general learning task, and demonstrate its hardness by deriving lower bounds on its minimax risk. We then analyse the impact of imposing constraints (such as a corruption level or privacy requirements) on the general learning task. This naturally leads us to a further investigation of strong data processing inequalities, a fundamental concept in information theory. Furthermore, by extending the hypothesis-testing interpretation of standard privacy definitions, we propose an asymmetric (prioritized) privacy definition.
    • We study efficient merging schemes for the prediction with expert advice problem and the geometric properties (mixability and exp-concavity) of loss functions that guarantee constant regret bounds (the standard definitions are recalled below). As a result, we construct two types of link functions (one via a calculus approach and another via a geometric approach) that can re-parameterize any binary mixable loss into an exp-concave loss.
    • We focus on some recent algorithms for online convex optimization that exploit the easy nature of the data (such as sparsity, predictable sequences, and curved losses) to achieve better regret bounds while retaining protection against the worst case. We unify some of these existing techniques to obtain new update rules for cases where these easy instances occur together, and analyse their regret bounds.
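    For reference, a short LaTeX note recalling the standard definitions of the two geometric properties named above; the notation (loss, parameter, prediction, outcome symbols) is ours, not taken from the thesis.

        % Standard definitions, in our own notation (an assumption, not the thesis's):
        % a loss $\ell(p, y)$ over predictions $p$ and outcomes $y$ is
        % $\eta$-exp-concave if $p \mapsto e^{-\eta\,\ell(p, y)}$ is concave for every $y$,
        % and $\eta$-mixable if for every distribution $q$ over predictions
        % $p_1, p_2, \dots$ there is a single prediction $p^\ast$ with
        \[
          \ell(p^\ast, y) \;\le\; -\frac{1}{\eta} \log \sum_i q_i\, e^{-\eta\, \ell(p_i, y)}
          \quad \text{for all } y .
        \]
        % Exp-concavity implies mixability at the same $\eta$; the link functions
        % constructed in the thesis go the other way for binary losses,
        % re-parameterizing any mixable binary loss into an exp-concave one.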